The complete genome of a novel coronavirus was sequenced directly from the cloacal swab of a Canada
goose that perished in a die-off of Canada and snow geese in Cambridge Bay, Nunavut, Canada.
Comparative genomics and phylogenetic analysis indicate it is a new species of Gammacoronavirus,
as it falls below the threshold of 90% amino acid similarity in the protein domains used to demarcate
species within Coronaviridae. Additional features that distinguish the genome of Canada goose coronavirus
include 6 novel ORFs, a partial duplication of gene 4 and a presumptive change in the proteolytic
processing of polyproteins 1a and 1ab.


Viruses belonging to the Coronaviridae family have a single-stranded, positive-sense RNA genome of
26-31 kb. Members of this family include both human pathogens, such as severe acute respiratory
syndrome virus (SARS-CoV) 1, and animal pathogens, such as porcine epidemic diarrhea virus 2. Currently, the
International Committee on the Taxonomy of Viruses (ICTV) recognizes four genera in the Coronaviridae
family: Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus. While the
reservoirs of the Alphacoronavirus and Betacoronavirus genera are believed to be bats, the Gammacoronavirus and
Deltacoronavirus genera have been shown to spread primarily through birds 3. The first three species of the
Deltacoronavirus genus were discovered in 2009 4 and recent work has vastly expanded the Deltacoronavirus
genus, adding seven additional species 3.

By contrast, relatively few species within the Gammacoronavirus genus have been identified. There are
currently two recognized species in the Gammacoronavirus genus: avian coronavirus (ACoV) and beluga whale
coronavirus SW1 (SW1). ACoVs infect multiple avian hosts and include several important poultry pathogens, such as
infectious bronchitis virus (IBV) and turkey coronavirus (TCoV) 5. IBV was first described in the United States 6
but has since been described around the globe 7. TCoV is the cause of acute enteritis in domestic
turkeys 8. The second species in the Gammacoronavirus genus, SW1, was first discovered in beluga whales 9 but has
since been detected in other cetaceans, such as Indo-Pacific bottlenose dolphins 10. Despite IBV being the first
coronavirus discovered and despite its impact on the poultry industry 11, the number of identified species within
the Gammacoronavirus genus remains small in comparison to the other coronavirus genera. Partial sequences of
coronaviruses from several other avian hosts suggest relatedness to IBV and TCoV. These
viruses, which include goose coronavirus (GCoV), were tentatively classified as part of the ACoV species. An
approximately 3 kb region of GCoV, including the nucleocapsid gene and several accessory genes, was previously
sequenced from a greylag goose in Norway.

Here we present the full genome of Canada goose coronavirus (CGCoV) sequenced directly from the cloacal 
swab of a Canada goose, which expired in a mass die-off in a remote region near the arctic in Nunavut, Canada. 
Our analyses demonstrate that it should be classified as a novel species in the Gammacoronavirus genus. 
Figure 1. Genome organization of Canada goose coronavirus. Purple indicates untranslated regions, blue
indicates putative proteins, green indicates coding region of mature non-structural proteins (NSP) and red 
indicates transcription regulatory sequences (TRS). The stem loop-like motif and octamer motif are contained 
within the 3' UTR. Genome organization figure was constructed using Geneious™ (Biomatters, v 9.1.8). 
indicate ACoV 4b homologues. Proteins are named numerically from the 3' end of the genome, with the 
exception of the structural genes, which are denoted by their common names. 


Results and Discussion 

Due to the remote location of the die-off, samples from the dead birds were not collected and sent to
a diagnostic laboratory until severe predation and decomposition had occurred. The poor sample quality, in
addition to the difficulty of coronavirus isolation, led to the failure to isolate infectious virus using standard methods.
However, the complete genome of a novel gammacoronavirus was assembled from high throughput sequencing
reads derived from the cloacal swab of a single Canada goose. The assembled genome of the novel Canada goose
coronavirus (CGCoV) is 28,539 nt in length (excluding the poly(A) tail) and has 38.4% GC content. The genome
of CGCoV is approximately 1000 nt longer than the reference genomes for ACoV available in GenBank. The
genome organization of CGCoV is presented in Fig. 1. The 5' UTR of CGCoV is 553 nt in length and contains a
higher GC content (48.3%) relative to the genome as a whole. The 3' UTR of CGCoV shares only 68% pairwise
identity with that of duck coronavirus (DCoV) and 47.5% pairwise identity to that of SW1. Like all coronavirus
genomes reported to date, CGCoV’s genome is dominated by the coding regions for the large polyproteins 1a
and 1ab, followed by the structural and accessory genes. The heptanucleotide slippery sequence UUUAAAC,
associated with the ribosomal slippage that produces polyprotein 1ab, was present at nt position 11,995. CGCoV’s
genome contains genes for all four structural proteins common to coronaviruses: spike (S), envelope (E),
membrane (M) and nucleocapsid (N). In addition, CGCoV contains 10 open reading frames (ORFs) predicted to
encode accessory proteins. The order of the structural and accessory protein-coding ORFs in CGCoV resembles
that of ACoV, but there are notable differences. The general genome organization of ACoV is
1ab-S-3a-3b-E-M-4b-4c-5a-5b-N-6b 13. However, there is some variance in the genome organization within the ACoV species. For
example, Australian IBV strains lack ORFs 4a, 4b and 5b 14. Overall, CGCoV contains a larger number (n = 14) of
ORFs coding for predicted accessory and structural proteins downstream of the polyprotein 1ab coding region.
Two additional ORFs (7a and 7b) are found between the CGCoV M and N ORFs. There are also two additional
ORFs (10 and 11) following the N gene. While some ACoVs do have ORFs following the N gene, ORFs 10 and
11 in CGCoV do not share obvious homology to those of IBV and TCoV. The 3' UTR of CGCoV is 301
nucleotides in length and contains the stem loop-like motif 113 bp upstream from the poly(A) tail. This stem loop-like
motif was first identified in astroviruses 15 but is also present in ACoVs and SARS-CoV 3. Further downstream in
the 3' UTR, the octanucleotide motif (GGAAGAGC) is found 71 bp upstream of the poly(A) tail. The 3' UTR of
CGCoV shares 98% pairwise identity to the partially sequenced GCoV and 84% pairwise identity to IBV.
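
Summary values such as GC content and the positions of short motifs (the UUUAAAC slippery sequence, the GGAAGAGC octamer) are straightforward to recompute from a genome sequence. The following is a minimal Python sketch under stated assumptions, not the authors' workflow: the function names are hypothetical, and the toy sequence is invented for illustration rather than taken from the CGCoV genome.

```python
def gc_content(seq: str) -> float:
    """Return the GC fraction (0-1) of a nucleotide sequence, ignoring case."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_motif(seq: str, motif: str) -> list:
    """Return 1-based start positions of every (possibly overlapping) motif hit.

    RNA and DNA spellings are treated alike by converting U to T first.
    """
    seq = seq.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)  # report positions in 1-based genome coordinates
        start = seq.find(motif, start + 1)
    return hits


# Toy sequence (hypothetical, not CGCoV): the slippery heptamer starts at position 5.
toy = "ACGTTTTAAACGGC"
print(f"GC content: {gc_content(toy):.1%}")
print("UUUAAAC at:", find_motif(toy, "UUUAAAC"))
```

Applied to the deposited genome record, the same two functions would be expected to reproduce the figures quoted in the text, provided the sequence is read in the same orientation and without the poly(A) tail.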

A trait suggesting common ancestry between CGCoV and ACoV is the canonical ACoV transcription
regulatory sequence (TRS) found at the end of the leader sequence in CGCoV. The TRS of CGCoV is identical to that
identified by Cao et al. (2008) as the TRS of TCoV (CTTAACAAA). Body TRSs regulate viral gene expression
by forming a complex with the leader TRS, causing discontinuous transcription of mRNA 16. Ten putative body
TRSs were found in the 3' end of the CGCoV genome (Fig. 1). Four of the ten putative TRSs (4, 6, 8, 9) were
exact matches to the canonical leader TRS. Three TRSs (2, 7, 11) contained one mismatch and the remaining
three TRSs (3, 5, 10) contained two mismatches to the leader TRS. The functionality of these TRSs would need
to be experimentally determined; however, previous studies have shown that TRSs of ACoVs are subject to some
variation 13,17. CGCoV contains twice the number of TRSs as ACoVs and a similar number compared to the nine
contained in SW1 9. Table 1 lists the nucleotide distances between each TRS and the start codon of the ORFs
found in CGCoV, which are comparable to those of TCoV 3.
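
The classification of body TRSs by their number of mismatches to the leader TRS amounts to a sliding-window Hamming distance. The sketch below illustrates that idea only; it is an assumption about how such a scan could be written, not the method used in the study. The function names and the toy sequence are hypothetical, while the 9-mer CTTAACAAA and the two-mismatch limit come from the text.

```python
LEADER_TRS = "CTTAACAAA"  # canonical leader TRS quoted in the text


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def scan_trs(seq: str, trs: str = LEADER_TRS, max_mismatches: int = 2):
    """Yield (1-based position, window, mismatches) for each window within the limit."""
    seq = seq.upper().replace("U", "T")
    k = len(trs)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, trs)
        if d <= max_mismatches:
            yield i + 1, window, d


# Toy sequence (hypothetical): one exact TRS and one copy with a single mismatch.
toy = "GGCTTAACAAAGGGCTTAACTAAGG"
for pos, window, d in scan_trs(toy, max_mismatches=1):
    print(pos, window, d)
```

In a genome of roughly 28.5 kb, a 9-mer allowed up to two mismatches will also match by chance, so hits from a scan like this would still need to be filtered by position (for example, a short distance upstream of a predicted start codon) before being treated as putative body TRSs.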

The start codon of CGCoV’s polyprotein 1ab is located 567 nucleotides downstream of the leader TRS. The
coronavirus polyprotein 1ab is cleaved into 15-16 non-structural proteins (NSPs) by two viral proteases 18.
| Protein | Top match in NCBI | Top match aa % identity* | Size (aa) | Distance between TRS and start codon (nt) |
| --- | --- | --- | --- | --- |
| 1a | 1a-Infectious bronchitis virus strain B1648 | 43 | 3825 | 480 |
| 1ab | 1ab-Infectious bronchitis virus strain ck/CH/LJL/05I | 57 | 6510 | 480 |
| Spike | Spike-Infectious bronchitis virus strain N2-75 | 53 | 1184 | 82 |
| 3 | n/a | n/a | 53 | 0 |
| 4a | n/a | n/a | 55 | 3 |
| Envelope | Envelope-Infectious bronchitis virus strain IS-1494 | 69 | 100 | n/a |
| Membrane | Membrane-Duck Coronavirus isolate DK/GD/2014 | 72 | 235 | 74 |
| 5b | 4b-Infectious bronchitis virus strain Georgia 1998 Vaccine | 41 | 88 | n/a |
| 6 | n/a | n/a | 63 | 5 |
| 7a | 4b-Duck Coronavirus isolate DK/GD/2014 | 23 | 92 | 3 |
| 7b | n/a | n/a | 69 | n/a |
| 8a | 5a-Duck Coronavirus isolate DK/GD/2014 | 37 | 65 | 4 |
| 8b | 5b-Duck Coronavirus isolate DK/GD/2014 | 46 | 85 | n/a |
| Nucleocapsid | Nucleocapsid-Goose Coronavirus | 94 | 414 | 94 |
| 10 | ORFxg-Goose Coronavirus | 92 | 97 | 0 |
| 11 | ORFyg-Goose Coronavirus | 81 | 180 | 91 |

Table 1. Putative viral proteins of Canada goose coronavirus. *Matches below 20% coverage not shown.


| Protein | CGCoV cleavage site | CGCoV size (aa) | TCoV cleavage site | TCoV size (aa) | IBV cleavage site | IBV size (aa) | SW1 cleavage site | SW1 size (aa) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NSP1/2 | AG↓GH | 609 | AG↓GK | 673 | AG↓GK | 673 | VD↓GD | 636 |
| NSP3 | AG↓GV | 1532 | AG↓GV | 1594 | AG↓GI | 1592 | LG↓GV | 1586 |
| NSP4 | LQ↓AG | 503 | LQ↓AG | 514 | LQ↓SG | 514 | LQ↓AG | 537 |
| NSP5 | LQ↓SN | 307 | LQ↓SS | 307 | LQ↓SS | 307 | LQ↓SN | 303 |
| NSP6 | VQ↓SK | 295 | VQ↓SK | 297 | VQ↓AK | 293 | VQ↓SK | 303 |
| NSP7 | LQ↓AV | 83 | LQ↓SV | 83 | LQ↓SV | 83 | LQ↓AV | 83 |
| NSP8 | LQ↓NN | 212 | LQ↓NN | 210 | LQ↓NN | 210 | LQ↓NN | 198 |
| NSP9 | LQ↓GK | 111 | LQ↓SK | 111 | LQ↓SK | 111 | LQ↓HG | 112 |
| NSP10 | SRFV* | 173 | VQ↓SA | 145 | VQ↓SV | 145 | LQ↓SV | 189 |
| NSP11 | - | - | - | 23 | - | 23 | - | 17 |
| NSP12 | SRFV* | 1101 | VQ↓SA | 941 | VQ↓SV | 940 | LQ↓SV | 926 |
| NSP13 | LQ↓SC | 599 | LQ↓SC | 601 | LQ↓SC | 600 | LQ↓AS | 601 |
| NSP14 | LQ↓SN | 522 | LQ↓GT | 521 | LQ↓GT | 514 | LQ↓SQ | 528 |
| NSP15 | LQ↓SI | 338 | LQ↓SI | 338 | LQ↓SI | 338 | LQ↓SL | 349 |
| NSP16 | LQ↓SG | 298 | LQ↓SA | 302 | LQ↓SA | 302 | LQ↓SD | 312 |

Table 2. Non-structural protein sizes and cleavage sites of gammacoronaviruses. *Amino acids present in
CGCoV where putative protease cleavage sites were observed in TCoV, IBV and SW1.


Putative cleavage sites for these proteases are present in CGCoV’s 1a and 1ab polyproteins, with the exception of
the NSP10/11 (polyprotein 1a) and NSP10/12 (polyprotein 1ab) cleavage sites. The missing cleavage site would
be located near the end of polyprotein 1a, producing NSPs 10 and 11, and also in the alternatively translated
polyprotein 1ab, producing NSPs 10 and 12. The absence of the NSP10/11 and NSP10/12 protease recognition site was
confirmed with Sanger sequencing. With the exception of the missing cleavage sites, the putative cleavage sites
would produce NSPs of sizes congruent with other Gammacoronavirus species (Table 2). No Gammacoronavirus
species to date, including CGCoV, has a papain-like protease cleavage site between NSP1 and NSP2 19.
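
The observation that a cleavage site is absent can be illustrated with a simple motif scan. The sketch below is a deliberately simplified assumption, not a protease-site predictor and not the authors' procedure: it treats any L/V-Q followed by S, A, G or N as a candidate 3C-like protease site, a pattern read off the CGCoV column of Table 2. The function names and the toy peptides are hypothetical.

```python
import re

# Zero-width pattern: the cut falls between Q (P1) and the following residue (P1').
# [LV]Q + [SAGN] is a simplification derived from the CGCoV sites in Table 2.
SITE = re.compile(r"(?<=[LV]Q)(?=[SAGN])")


def cut_positions(protein: str) -> list:
    """Return the number of residues preceding each candidate cut."""
    return [m.start() for m in SITE.finditer(protein.upper())]


def cleave(protein: str) -> list:
    """Split a polyprotein at every candidate site and return the fragments."""
    bounds = [0] + cut_positions(protein) + [len(protein)]
    return [protein[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy peptides (hypothetical): three candidate sites, then none around SRFV.
print(cleave("MKLQSGAAVQAKTTLQNNPP"))
print(cleave("AAASRFVAAA"))
```

A junction carrying SRFV, as reported for CGCoV at the expected NSP10/11 and NSP10/12 positions, produces no match under this pattern, so the flanking products would stay fused in the toy model. Papain-like protease sites (AG↓G in Table 2) are not covered by this pattern.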

While the genome structure of CGCoV resembles that of ACoV, there are some notable differences. For 
example, there are no homologues to ACoV’s 3a or 3b accessory proteins in CGCoV, a trait shared with SW1. 
Furthermore, CGCoV has a number of ORFs that do not appear to have homologues in other sequenced 
Gammacoronavirus species, such as the ORFs for putative proteins 3 and 4a (Fig. 1). These two ORFs are found 
in CGCoV in the corresponding location of ACoV’s 3a and 3b ORFs (between the S and E ORFs) and are also 
similar in size to ACoV’s 3a and 3b proteins. However, they share no obvious sequence similarity with any 3a or 
3b gene, or any other entry in NCBI (Table 1). ACoV’s 3a and 3b proteins have been shown to be unnecessary for
replication 20; however, knock-out mutants for these accessory genes are attenuated. The IBV gene 3 is
functionally tricistronic, meaning the 3a, 3b and E proteins are under the control of a single TRS 22,23. This is not the case
in CGCoV, as the E ORF of CGCoV shares a TRS with only the 4a ORF, and ORF 3 is preceded by a
separate TRS (Fig. 1).

Figure 2. The phylogeny of gammacoronavirus spike and nucleocapsid proteins. Maximum likelihood trees were
built using the amino acid sequences of the spike protein (a) and nucleocapsid protein (b) domains, aligned
with ClustalW 31 in MEGA X using the Jones-Taylor-Thornton (JTT) substitution model and 1000 bootstraps 32.
IBV, infectious bronchitis virus; TCoV, turkey coronavirus; PCoV, pigeon coronavirus; DCoV, duck
coronavirus.

An additional TRS is also found between CGCoV’s M and N ORFs, preceding the ORFs for proteins 7a and 7b
(Fig. 1). ACoVs commonly have two ORFs between the M gene and gene 5, coding for the 4b and 4c accessory
proteins. CGCoV contains 4 ORFs between the M gene and gene 8 (the ACoV gene 5 homologue). Two of these ORFs
(5b and 7a) are ACoV 4b homologues, likely the result of gene duplication. This area in IBV has been identified
as a hotspot for recombination 24. The region between the ACoV M gene and gene 5 was formerly called the intergenic
region because of the lack of a TRS. However, it was later shown that gene 4 is expressed using an alternative
TRS in IBV 7. Notably, one of the 4b homologues (i.e. 5b) in CGCoV does have a TRS (Fig. 1). The use of template
switching at TRSs is thought to contribute to recombination in coronaviruses 25. The two CGCoV 4b homologues are not
identical to each other (Table 1). Amino acid sequence identity to other 4b proteins is low for both CGCoV 4b
homologues, 41% to IBV and 23% to DCoV, respectively. The gene 4 duplication was also confirmed by Sanger
sequencing of the genomic region between the M ORF and gene 8.

The ACoV 5a and 5b accessory proteins (8a and 8b in CGCoV) appear to be the only accessory proteins
conserved in all 3 Gammacoronavirus species, although gene order differs. ORFs encoding putative proteins 5a and 5b
belong to the bicistronic gene 5 of ACoVs and are also unnecessary for replication 21. To date, all publicly
available sequence information suggests that Gammacoronavirus species have lost the NSP1 cleavage site. The function
of NSP1 in alphacoronaviruses and betacoronaviruses is the inhibition of host protein production. Accessory
protein 5a has been shown to have adopted this function in place of NSP1 in IBV 9.

The majority of structural proteins of CGCoV also share low amino acid sequence identity (53-72%) with IBV
and DCoV. Phylogenetic analysis of the spike gene shows that the CGCoV spike gene clusters with the IBV spike
gene, separate from the TCoV cluster (Fig. 2a). Figure 2b also demonstrates that the nucleocapsid gene of CGCoV
is distantly related to those of ACoVs. However, the CGCoV nucleocapsid protein does share 94% amino acid
sequence identity with the nucleocapsid protein encoded in the partially sequenced greylag GCoV genome 13. In
addition, ORFs 10 and 11, which follow the nucleocapsid gene, also share high amino acid identity with
greylag GCoV proteins, 92% and 81% respectively. It should be noted that, among full and partial genomes of
gammacoronaviruses sequenced to date, ORFs 10 and 11 seem to be unique to CGCoV and GCoV and are both
preceded by a TRS, suggesting that these ORFs are very likely expressed. The fact that some CGCoV proteins
share higher amino acid sequence similarity with the partial GCoV sequences available suggests these two viruses
are more closely related to each other than to other gammacoronaviruses known to date.

Figure 3. The phylogeny of Canada goose coronavirus. A maximum likelihood tree was built using the
concatenated amino acid sequences of the replicase and helicase protein domains, aligned with ClustalW 31 in
MEGA X using the Jones-Taylor-Thornton (JTT) substitution model and 1000 bootstraps 32. Numbers at nodes
indicate the bootstrap value. Clade labels: Alphacoronavirus, Betacoronavirus, Gammacoronavirus,
Deltacoronavirus.

The phylogenetic tree built using the coding regions for the conserved replicase and helicase domains
demonstrates that CGCoV clusters with gammacoronaviruses and shares a more recent common ancestor with ACoV
than with the cetacean gammacoronaviruses (Fig. 3). Further comparisons suggest that CGCoV is a separate
species from ACoV. Current taxonomy of Coronaviridae is determined using pairwise comparisons of the amino acid
sequence of seven conserved domains in the 1ab polyprotein. Members of the same species share over 90% amino
acid identity in these seven conserved domains 5. The percent identity of CGCoV to both ACoV and SW1 falls well
below the 90% threshold set by ICTV, suggesting CGCoV is a separate species (Table 3). Within Coronaviridae, CGCoV
shares the highest identity (68%) in the 7 conserved domains with the gammacoronaviruses TCoV and DCoV.
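
The species-demarcation argument reduces to simple arithmetic on the per-domain identities in Table 3. The sketch below reproduces the "Average" row and compares it with the 90% threshold quoted in the text. It assumes an unweighted mean of the seven per-domain values, which matches the table's averages; the function and variable names are hypothetical.

```python
from statistics import mean

# Percent amino acid identity of CGCoV to each virus, per conserved domain (Table 3).
IDENTITY = {
    "ADP-ribose-1''-phosphatase":    {"IBV": 42, "TCoV": 43, "DCoV": 38, "SW1": 23},
    "3C-like protease":              {"IBV": 56, "TCoV": 58, "DCoV": 57, "SW1": 49},
    "RdRp":                          {"IBV": 80, "TCoV": 80, "DCoV": 83, "SW1": 69},
    "Helicase":                      {"IBV": 89, "TCoV": 90, "DCoV": 92, "SW1": 78},
    "Exonuclease":                   {"IBV": 78, "TCoV": 72, "DCoV": 77, "SW1": 56},
    "Endoribonuclease":              {"IBV": 53, "TCoV": 53, "DCoV": 54, "SW1": 41},
    "Ribose-2'-O-methyltransferase": {"IBV": 74, "TCoV": 77, "DCoV": 76, "SW1": 65},
}
SPECIES_THRESHOLD = 90.0  # same-species cut-off quoted in the text


def average_identity(table: dict) -> dict:
    """Unweighted mean identity across domains, per comparison virus."""
    viruses = next(iter(table.values())).keys()
    return {v: mean(row[v] for row in table.values()) for v in viruses}


for virus, avg in average_identity(IDENTITY).items():
    verdict = "same species" if avg > SPECIES_THRESHOLD else "distinct species"
    print(f"{virus}: {avg:.1f}% -> {verdict}")
```

Every comparison lands far below the cut-off even though the helicase domain alone reaches 89-92% against the avian viruses, which shows why the criterion is applied across all seven domains rather than to any single one.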

As the full genome was sequenced from only the cloacal swab of a single Canada goose, a screening PCR
was designed based on the 4b duplication region unique to CGCoV and performed on all samples. The Sanger
sequencing primers for the region between the M gene and gene 8 were used, as this area of the genome is specific to
CGCoV. All samples were found to be positive, with the exception of the pharyngeal swab of the snow goose and
the lung tissue of the second Canada goose which could not be tested as the sample was exhausted. Amplicons
were Sanger sequenced and confirmed to match the CGCoV genome. High throughput sequencing conducted on
RNA extracted from cloacal swabs from the second Canada goose and the snow goose also resulted in partial (64%
and 18%) genomes of CGCoV. While this does not confirm the virus’s presence in all animals that perished in
the die-off, it shows CGCoV was present in all birds that were available for testing. Further studies will require
the availability of an infectious virus to determine the pathogenicity of CGCoV and its ability to cause mortality
in Canada geese and snow geese.

To summarize, the complete genome of CGCoV, a novel Gammacoronavirus species, was sequenced directly
from the cloacal swab of a Canada goose associated with a mass die-off. The CGCoV genome was also detected
in samples derived from a second Canada goose and a snow goose that perished in the die-off, using PCR, Sanger
and high throughput sequencing. Comparative genomics and phylogenetic analysis indicate CGCoV clusters
with ACoV but is a distinct Gammacoronavirus species. Interesting features of this new species include the
presence of two 4b homologues, a putative change in the proteolytic processing of the polyproteins 1a and 1ab,
and six novel accessory genes.

| Domain | aa % identity to IBV | aa % identity to TCoV | aa % identity to DCoV | aa % identity to SW1 |
| --- | --- | --- | --- | --- |
| ADP-ribose-1''-phosphatase | 42 | 43 | 38 | 23 |
| 3C-like protease | 56 | 58 | 57 | 49 |
| RdRp | 80 | 80 | 83 | 69 |
| Helicase 1 | 89 | 90 | 92 | 78 |
| Exonuclease | 78 | 72 | 77 | 56 |
| Endoribonuclease | 53 | 53 | 54 | 41 |
| Ribose-2'-O-methyltransferase | 74 | 77 | 76 | 65 |
| Average | 67 | 68 | 68 | 54 |

Table 3. Comparison of the amino acid pairwise identity of 7 conserved coronavirus domains in polyprotein 1ab
of Canada goose coronavirus to other gammacoronaviruses.

Methods 

Source of samples. A large die-off of Canada and snow geese occurred in the fall of 2017 near the Arctic in
Cambridge Bay, Nunavut, Canada. Due to poor carcass quality and remote location, samples were only collected
from two dead Canada geese and one snow goose, all of which had undergone predation and decomposition.
Cloacal and pharyngeal swabs were collected from all three birds, and lung tissue was collected from one Canada
goose. Other organs were not present or were in extremely poor condition. Routine laboratory testing by the
National Reference Laboratory for common avian pathogens, such as avian influenza and avian paramyxovirus,
gave negative results. Virus isolation was attempted by two serial passages in SPF chicken eggs
using protocols prescribed by the World Organization for Animal Health (OIE) for the most closely related
gammacoronavirus, infectious bronchitis virus (IBV). Samples were then subjected to targeted sequence enrichment 26
and next-generation sequencing on an Illumina MiSeq platform.

Sample pre-treatment. Tissues were homogenized using a Precellys Evolution homogenizer (Bertin
Instruments) according to the manufacturer’s instructions. Following clarification by centrifugation at 3000 rpm
for 10 minutes, nucleic acids were extracted using the MagMAX Pathogen RNA/DNA Kit (Ambion) according to
the manufacturer’s instructions.

cDNA synthesis was then performed using the Superscript™ IV First-Strand Synthesis System (SSIV)
(ThermoFisher) according to the manufacturer’s recommendation. A total of 11 µL of extracted total nucleic
acid was mixed with dNTPs (10 mM) and a tagged random nonamer primer (40 µM) (GTT TCC CAG TCA
CGA TAN NNN NNN NN). Samples were incubated at 65 °C for 5 minutes, and then placed on ice for 1 minute.
A reagent mixture of 5x SSIV Buffer, Ribonuclease Inhibitor (40 U/µL), DTT (100 mM) and Superscript™ IV
Reverse Transcriptase was then added. The samples were incubated for 10 minutes at 23 °C, 10 minutes at 50 °C
and 10 minutes at 80 °C.

Second strand synthesis was performed using Sequenase Version 2.0 DNA Polymerase (ThermoFisher)
according to the manufacturer’s recommendation. The first strand synthesis product was incubated with 10 µL of
Sequenase Version 2.0 DNA Polymerase diluted in 5x reaction buffer and nuclease-free water. Samples were then
heated to 37 °C over five minutes and incubated at 37 °C for 12 minutes, followed by 2 minutes at 95 °C. Samples
were then cooled to 10 °C and 1.2 µL of Sequenase DNA polymerase in dilution buffer was added. Samples were
again ramped to 37 °C over five minutes and incubated at 37 °C for 12 minutes, followed by 8 minutes at 95 °C.
A total of 6 µL of the second strand synthesis product was then used as template for amplification. AccuPrime™
Taq DNA Polymerase (ThermoFisher) was mixed with 10X AccuPrime™ PCR Buffer I, nuclease-free water and
a primer for the nonamer’s tag (100 µM). 30 cycles of PCR were then performed with the following parameters:
30 seconds at 94 °C, 30 seconds at 40 °C, 30 seconds at 50 °C and 1 minute at 72 °C. cDNA/DNA mixtures were
then cleaned with Genomic DNA Clean & Concentrator columns (Zymo Research) and eluted in 20 mM Tris
(ThermoFisher).

Library preparation and sequencing. Sequence libraries were prepared with the KAPA HyperPlus
library kit (Roche). Sequence library construction and capture were carried out according to Nimblegen’s SeqCap
EZ HyperCap Workflow User’s Guide v1. Samples were pooled in equal amounts by weight prior to capture.
Sequencing was performed on an Illumina MiSeq instrument in the National Centre for Foreign Animal Disease
biocontainment level 3 sequencing facility. A V2 flow cell was used with a 500 cycle reagent cartridge (Illumina).

5' RACE and Sanger sequencing. 5' RACE was used to obtain the missing leader sequence (52 bp).
The SMARTer 5' RACE and 3' RACE kit (Takara Bio) was used according to the kit instructions. The gene
specific primer used for 5' RACE was TCAGCTACAGTAGAGGGAGATGTCATAGGTGC. For Sanger
sequencing, amplification was performed using the KAPA HiFi HotStart ReadyMix PCR Kit (KAPA Biosystems).
The primers CTAAAGAGAAGGTGGACACTGGT and CTAAGAATGCGAACTTCACAGAGC were
used to amplify the gene 4b homologue region. The primers GTTGTTGTGTTACAAGGCAAGGG and
GGATTATGATCAAACCATGAACCTGG were used to amplify the NSP10/12 region. Cycling conditions
used to generate amplicons for Sanger sequencing were: 1 cycle of 95 °C for 3 minutes; 40 cycles of 98 °C for 20
seconds, 65 °C for 15 seconds and 72 °C for 2.5 minutes; and 1 cycle of 72 °C for 3 minutes. Amplicons were cleaned using
AMPure XP beads (Beckman Coulter) according to the manufacturer’s directions. Sanger sequencing was
performed on the ABI Genetic Analyzer 3130XL platform using the BigDye Terminator v3.1 Cycle Sequencing Kit
(Applied Biosystems) according to the user manual.
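
The cycling conditions for the Sanger amplicons can be written down as a small data structure, which makes the programmed run time easy to check. The sketch below is only an illustration: the `Step` class and function name are hypothetical, and the total counts hold times only, ignoring the instrument's ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # hold time in seconds


# Sanger amplicon program as described in the text.
INITIAL = Step(95, 180)                                   # 1 cycle: 95 C for 3 min
CYCLE = (Step(98, 20), Step(65, 15), Step(72, 150))       # repeated 40 times
N_CYCLES = 40
FINAL = Step(72, 180)                                     # 1 cycle: 72 C for 3 min


def total_hold_seconds(initial: Step, cycle, n_cycles: int, final: Step) -> int:
    """Sum programmed hold times; ramping between temperatures is ignored."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(f"{total} s = {total / 60:.1f} min")  # 7760 s = 129.3 min
```

The 2.5 minute extension step accounts for most of the run: 40 cycles of 185 seconds contribute 7400 of the 7760 seconds.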

Bioinformatics. Read quality was assessed using FastQC and reads were trimmed using Trimmomatic 2 (version 0.36).
Host reads were then filtered with RAMBO-K, using the only complete genome of a goose species (Anser
cygnoides) currently available and DCoV 3. The near-complete genome sequence of CGCoV was assembled from
NGS-derived sequences from a cloacal swab of one Canada goose using SPAdes 29. Sanger reads were aligned to
the draft genome in Geneious™ (Biomatters, v 9.1.8). Annotations were performed using Geneious and protein
domains were identified using Pfam. The Canada goose coronavirus genome is available under accession
number MK359255 on NCBI.
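
The read-processing steps named above (quality assessment, trimming, assembly) could be chained roughly as follows. This is a hedged sketch rather than the authors' pipeline: the text does not report trimming parameters, file names or whether reads were handled as pairs, so the paired-end layout, the `SLIDINGWINDOW:4:20` and `MINLEN:36` settings, the jar file name and all paths are placeholder assumptions. The host-filtering step with RAMBO-K is left out because its command-line options are not given in the text. The function only builds the command lines; nothing is executed.

```python
from pathlib import Path


def build_commands(r1: str, r2: str, outdir: str,
                   trimmomatic_jar: str = "trimmomatic-0.36.jar") -> list:
    """Return argv lists for QC, trimming and assembly; nothing is run here."""
    out = Path(outdir)
    p1, u1 = out / "r1.paired.fq.gz", out / "r1.unpaired.fq.gz"
    p2, u2 = out / "r2.paired.fq.gz", out / "r2.unpaired.fq.gz"

    # FastQC report on the raw reads.
    fastqc = ["fastqc", "-o", str(out), r1, r2]

    # Trimmomatic in paired-end mode; the two trimming steps are placeholders.
    trim = ["java", "-jar", trimmomatic_jar, "PE", r1, r2,
            str(p1), str(u1), str(p2), str(u2),
            "SLIDINGWINDOW:4:20", "MINLEN:36"]

    # Host-read filtering (RAMBO-K in the text) would sit between these steps.

    # De novo assembly of the surviving read pairs with SPAdes.
    spades = ["spades.py", "-1", str(p1), "-2", str(p2), "-o", str(out / "spades")]
    return [fastqc, trim, spades]


for cmd in build_commands("sample_R1.fastq.gz", "sample_R2.fastq.gz", "results"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed and the paths point at real files.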